CLAIMS:

What is claimed is: 

1. A method in a data processing system for isolating
failing hardware in the data processing system, the 
method comprising: 

responsive to detecting a recovery attempt from an 
error for an operation involving a hardware component, 
storing an indication of the attempt; and

responsive to the error exceeding a threshold, 
placing the hardware component in an unavailable state. 
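
By way of non-limiting illustration, the two steps recited above might be sketched as follows. The names `HardwareComponent`, `ErrorIsolator`, and `on_recovery_attempt`, the in-memory list used as the stored indication, and the threshold value are all hypothetical and do not appear in the claims; the sketch simply counts recovery attempts per component as one possible reading of "the error exceeding a threshold".

```python
from dataclasses import dataclass, field

# Hypothetical default; the claims do not fix a threshold value.
ERROR_THRESHOLD = 3


@dataclass
class HardwareComponent:
    name: str
    available: bool = True
    error_count: int = 0


@dataclass
class ErrorIsolator:
    threshold: int = ERROR_THRESHOLD
    error_log: list = field(default_factory=list)

    def on_recovery_attempt(self, component: HardwareComponent, error: str) -> bool:
        """Record a recovery attempt; return True while the component stays available."""
        # Store an indication of the attempt (here, an in-memory log entry).
        self.error_log.append((component.name, error))
        component.error_count += 1
        # Place the component in an unavailable state once the threshold is exceeded.
        if component.error_count > self.threshold:
            component.available = False
        return component.available


isolator = ErrorIsolator(threshold=2)
adapter = HardwareComponent("pci-adapter-0")
results = [isolator.on_recovery_attempt(adapter, "bus error") for _ in range(3)]
```

In this sketch the first two attempts leave the component available and the third, which exceeds the assumed threshold of two, marks it unavailable.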

2. The method of claim 1 further comprising: 

clearing the unavailable state of the hardware
component in response to a hot-plug action replacing the
hardware component.

3. The method of claim 1, wherein the placing step 
comprises:

making a call to a hardware interface layer to place 
the hardware component into a permanent reset state. 
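
As a hedged illustration of claims 2 and 3, the sketch below models a call into a hardware interface layer that holds a component in reset, and a hot-plug handler that clears that state when the component is replaced. `HardwareInterfaceLayer`, `hold_in_reset`, `release_reset`, and `on_hot_plug_replace` are hypothetical names, and resetting the error count on replacement is an assumption not stated in the claims.

```python
class HardwareInterfaceLayer:
    """Hypothetical stand-in for a platform's hardware interface layer."""

    def __init__(self):
        self._in_reset = set()

    def hold_in_reset(self, component_id: str) -> None:
        # The component stays in reset until it is explicitly released.
        self._in_reset.add(component_id)

    def release_reset(self, component_id: str) -> None:
        self._in_reset.discard(component_id)

    def is_in_reset(self, component_id: str) -> bool:
        return component_id in self._in_reset


def place_unavailable(hal: HardwareInterfaceLayer, component_id: str) -> None:
    """Placing step: call the interface layer to hold the component in reset."""
    hal.hold_in_reset(component_id)


def on_hot_plug_replace(hal: HardwareInterfaceLayer, component_id: str,
                        error_counts: dict) -> None:
    """Clearing step: a replaced component starts with a clean state."""
    error_counts[component_id] = 0  # assumption: history is discarded on replacement
    hal.release_reset(component_id)


hal = HardwareInterfaceLayer()
error_counts = {"slot-3": 5}
place_unavailable(hal, "slot-3")
held = hal.is_in_reset("slot-3")
on_hot_plug_replace(hal, "slot-3", error_counts)
```

Here "permanent" is read as persisting across further recovery attempts until a hot-plug replacement occurs; other readings are possible.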

4. The method of claim 1, wherein the indication is 
stored in an error log.

5. The method of claim 1 further comprising:

responsive to a selected number of recovery attempts
occurring, recreating the error.

6. The method of claim 1, wherein the error is an error 
caused by a PCI bus operation.

7. The method of claim 1, wherein the detecting and
placing steps occur in a firmware layer within the data
processing system.

8. The method of claim 1, wherein the detecting step
occurs in a device driver and the placing step occurs in a
firmware.

9. The method of claim 1, wherein the threshold is the
error occurring successively a selected number of times.

10. A method in a data processing system for handling 
errors, the method comprising: 

responsive to an occurrence of an error, determining
whether the error is a recoverable error;

responsive to a determination that the error is a
recoverable error, identifying slots on the bus
indicating an error state;

incrementing an error counter for each identified
slot; and

responsive to the error counter exceeding a 
threshold, placing the slot into a permanently 
unavailable state. 
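
For illustration only, the per-slot handling recited in claim 10 might be sketched as below. `SlotState`, `handle_bus_error`, the dictionary-based counters, and the default threshold are hypothetical; the branch that returns a slot to the available state reflects the dependent claim that follows rather than claim 10 itself.

```python
from enum import Enum


class SlotState(Enum):
    AVAILABLE = "available"
    ERROR = "error"
    PERMANENTLY_UNAVAILABLE = "permanently unavailable"


def handle_bus_error(slots: dict, counters: dict, recoverable: bool,
                     threshold: int = 3) -> list:
    """Update slot states after an error; return the slots that were in error."""
    if not recoverable:
        # Non-recoverable errors are outside the scope of this sketch.
        return []
    # Identify the slots on the bus that indicate an error state.
    affected = [slot for slot, state in slots.items() if state is SlotState.ERROR]
    for slot in affected:
        # Increment the error counter for each identified slot.
        counters[slot] = counters.get(slot, 0) + 1
        if counters[slot] > threshold:
            slots[slot] = SlotState.PERMANENTLY_UNAVAILABLE
        else:
            # Counter has not exceeded the threshold: the device may resume.
            slots[slot] = SlotState.AVAILABLE
    return affected


slots = {"slot-1": SlotState.ERROR, "slot-2": SlotState.ERROR,
         "slot-3": SlotState.AVAILABLE}
counters = {"slot-1": 3}
affected = handle_bus_error(slots, counters, recoverable=True, threshold=3)
```

With the assumed threshold of three, `slot-1` (already at three errors) is retired on its fourth error, while `slot-2` records its first error and is returned to service.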

11. The method of claim 10 further comprising:

responsive to the error counter failing to exceed
the threshold, placing the slot into an available state,
wherein a device within the slot resumes functioning.

12. A data processing system comprising:
a bus system; 

a communications unit connected to the bus system; 

a memory connected to the bus system, wherein the
memory includes a set of instructions; and

a processing unit connected to the bus system, 
wherein the processing unit executes the set of 
instructions to store an indication of a recovery attempt 
from an error in response to detecting the recovery 
attempt; and place the hardware component in an
unavailable state in response to the error exceeding a
threshold.

13. A data processing system comprising:
a bus system;

a communications unit connected to the bus system; 

a memory connected to the bus system, wherein the
memory includes a set of instructions; and

a processing unit connected to the bus system, 
wherein the processing unit executes the set of
instructions to determine whether the error is a
recoverable error in response to an occurrence of an
error; identify slots on the bus indicating an error
state in response to a determination that the error is a
recoverable error; increment an error counter for each
identified slot; and place the slot into a permanently 
unavailable state in response to the error counter 
exceeding a threshold.

14. A data processing system for isolating failing
hardware in the data processing system, the data 
processing system comprising: 

storing means, responsive to detecting a recovery 
attempt from an error, for storing an indication of the
attempt; and 

placing means, responsive to the error occurring in 
the more than a threshold for a hardware component, for 
placing the hardware component in an unavailable state. 

15. The data processing system of claim 14 further
comprising:

clearing means for clearing the unavailable state of
the hardware component in response to a hot-plug action
replacing the hardware component.

16. The data processing system of claim 14, wherein the 
placing means comprises: 

means for making a call to a hardware interface 
layer to place the hardware component into a permanent
reset state. 

17. The data processing system of claim 14, wherein the 
indication is stored in an error log. 

18. The data processing system of claim 14 further 
comprising:

recreating means, responsive to a selected number of 
recovery attempts occurring, for recreating the error. 

19. The data processing system of claim 14, wherein the 
error is an error caused by a PCI bus operation.

20. The data processing system of claim 14, wherein the
detecting means and the placing means are located in a 
firmware layer within the data processing system. 

21. The data processing system of claim 14, wherein the 
detecting means is located in a device driver and the 
placing means is located in a firmware. 

22. The data processing system of claim 14, wherein the
threshold is the error occurring successively a selected
number of times.

23. A data processing system for handling errors, the 
data processing system comprising:

determining means, responsive to an occurrence of an 
error, for determining whether the error is a recoverable 
error; 

identifying means, responsive to a determination 
that the error is a recoverable error, for identifying
slots on the bus indicating an error state; 

incrementing means for incrementing an error counter 
for each identified slot; and 

placing means, responsive to the error counter 
exceeding a threshold, for placing the slot into a
permanently unavailable state. 

24. The data processing system of claim 23, wherein the
placing means is a first placing means and further
comprising:

second placing means, responsive to the error
counter failing to exceed the threshold, for placing the 
slot into an available state, wherein a device within the 
slot resumes functioning. 

25. A computer program product in a computer readable 
medium for isolating failing hardware in the data 
processing system, the computer program product 
comprising:

first instructions, responsive to detecting a
recovery attempt from an error, for storing an indication
of the attempt; and

second instructions, responsive to the error
occurring in the more than a threshold for a hardware
component, for placing the hardware component in an
unavailable state.

26. The computer program product of claim 25 further 
comprising: 

third instructions for clearing the unavailable
state of the hardware component in response to a hot-plug
action replacing the hardware component. 

27. The computer program product of claim 25, wherein 
the placing step comprises:

third instructions for making a call to a hardware
interface layer to place the hardware component into a
permanent reset state. 

28. The computer program product of claim 25, wherein
the indication is stored in an error log.

29. The computer program product of claim 25 further
comprising:

third instructions, responsive to a selected number 
of recovery attempts occurring, for recreating the error. 

30. The computer program product of claim 25, wherein 
the error is an error caused by a PCI bus operation. 

31. The computer program product of claim 25, wherein
the detecting and placing steps occur in a firmware layer
within the data processing system. 

32. The computer program product of claim 25, wherein
the detecting step occurs in a device driver and the
placing step occurs in a firmware.

33. The computer program product of claim 25, wherein
the threshold is the error occurring successively a
selected number of times.

34. A computer program product in a computer readable 
medium for handling errors, the computer program product 
comprising: 

first instructions, responsive to an occurrence of
an error, for determining whether the error is a
recoverable error;

second instructions, responsive to a determination
that the error is a recoverable error, for identifying
slots on the bus indicating an error state;

third instructions for incrementing an error counter
for each identified slot; and

fourth instructions, responsive to the error counter
exceeding a threshold, for placing the slot into a
permanently unavailable state.

35. The computer program product of claim 34 further 
comprising:

fifth instructions, responsive to the error counter
failing to exceed the threshold, for placing the slot
into an available state, wherein a device within the slot
resumes functioning.



